Over the past several years, social scientists have noted that the discourse of “diversity” is expanding in use across various social contexts (Berrey 2015). In this project, we are interested in measuring how often diversity and its various metonyms are mentioned in biomedical research over nearly three decades (1990-2017). Our first working hypothesis (H1) is that the use of the term “diversity” and its related terminology has increased in biomedical abstracts since 1990. To test H1, we grouped diversity-related terms into 11 categories (aging, ancestry, cultural, diversity, genetic, minority, population, race/ethnicity, sex/gender, sexuality, and social class) and then used natural language processing to quantify (1) the variation in raw term counts over time and (2) the proportion of publications in each year in which each set of terms is used. For a full list of terms in each category, see Supplementary Table 1A at the end of this page.
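The raw-count step can be illustrated with a minimal Python sketch. It assumes that abstracts arrive as (year, text) pairs and that a term matches case-insensitively on whole words. The abbreviated term lists, the function names, and the sample abstracts are all hypothetical stand-ins, not the project's actual pipeline or the lists in Supplementary Table 1A.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, abbreviated term lists; the real lists are in Supplementary Table 1A.
CATEGORIES = {
    "diversity": ["diversity", "diverse"],
    "sex/gender": ["women", "men", "sex", "gender"],
    "aging": ["aging", "elderly", "older adults"],
}

def compile_patterns(categories):
    """Build one case-insensitive, whole-word regex per category."""
    patterns = {}
    for name, terms in categories.items():
        # Longest terms first, so longer variants are tried before shorter ones.
        ordered = sorted(terms, key=len, reverse=True)
        alternation = "|".join(re.escape(t) for t in ordered)
        patterns[name] = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    return patterns

def raw_counts_by_year(abstracts, categories):
    """Return {year: Counter({category: number of term mentions})}."""
    patterns = compile_patterns(categories)
    counts = defaultdict(Counter)
    for year, text in abstracts:
        for name, pattern in patterns.items():
            # Every mention counts, so one abstract can contribute several hits.
            counts[year][name] += len(pattern.findall(text))
    return counts

# Invented sample abstracts as (year, text) pairs.
sample = [
    (1992, "Genetic diversity among older adults and women."),
    (2017, "Diversity in trials: sex and gender differences in diverse cohorts."),
]
counts = raw_counts_by_year(sample, CATEGORIES)
print(counts[2017]["diversity"])  # 2
print(counts[1992]["aging"])      # 1
```

Whole-word matching keeps a short term such as "men" from matching inside "women", which matters when categories contain overlapping words.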

Raw Growth in Biomedical Abstracts

The first two plots show how these sets of terms have changed in a sample of ~59,000 biomedical abstracts.

Figure 1A shows the growth in the raw word frequencies of diversity-related terms from 1990-2017. The term diversity has grown consistently over time (from only 2 mentions in 1992 to 381 in 2017). This trend offers preliminary support for H1, but it is only one small part of how diversity-related terms are changing in use. Overall, most of the terms in this graph have increased, but the most notable growth is in terms related to sex/gender and aging research. While this growth is partly because scientists now have a larger dictionary of terms to describe sexed/gendered and aging populations, the focus on these topics is not simply a function of a more comprehensive vocabulary. Supplementary Table 1B provides the totals of the top terms used in the aging, social class, sex/gender, and sexuality categories; it shows, for example, that the term “women” alone would have the fifth-highest total in the final year of this plot. Figure 1A also shows that terms like population, genetic, and cultural have risen notably over time, while terms associated with race/ethnicity, minority, social class, sexuality, and ancestry have all grown at a slower pace than the diversity terms.

Proportional Growth in Biomedical Abstracts

To normalize for the overall growth in publications, we next examine the proportion of abstracts in each year that use each set of terms.
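One way to compute this normalization is sketched below in Python. It assumes that an abstract counts once toward a category if it mentions at least one of that category's terms, and that each year's denominator is the number of abstracts in the sample for that year. The term lists, function names, and sample abstracts are hypothetical illustrations, not the project's actual code or data.

```python
import re
from collections import defaultdict

# Hypothetical, abbreviated term lists (see Supplementary Table 1A for the real ones).
CATEGORIES = {
    "diversity": ["diversity", "diverse"],
    "race/ethnicity": ["race", "racial", "ethnicity", "ethnic"],
}

def category_pattern(terms):
    """Case-insensitive, whole-word regex matching any term in the list."""
    alternation = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)

def proportions_by_year(abstracts, categories):
    """Share of each year's abstracts that mention at least one term per category."""
    patterns = {name: category_pattern(terms) for name, terms in categories.items()}
    totals = defaultdict(int)                     # abstracts per year (denominator)
    hits = defaultdict(lambda: defaultdict(int))  # abstracts with at least one match
    for year, text in abstracts:
        totals[year] += 1
        for name, pattern in patterns.items():
            if pattern.search(text):
                hits[year][name] += 1
    return {
        year: {name: hits[year][name] / totals[year] for name in categories}
        for year in sorted(totals)
    }

# Invented sample abstracts as (year, text) pairs.
sample = [
    (1990, "Ethnic and racial differences in diverse cohorts."),
    (1990, "A randomized trial of statins."),
    (2017, "Diversity of the gut microbiome."),
    (2017, "Racial diversity in clinical trials."),
]
props = proportions_by_year(sample, CATEGORIES)
print(props[1990])  # {'diversity': 0.5, 'race/ethnicity': 0.5}
print(props[2017])  # {'diversity': 1.0, 'race/ethnicity': 0.5}
```

Counting each abstract at most once per category is the key difference from raw counts: a single abstract that repeats a term many times cannot inflate its category, and a larger corpus does not by itself raise the line.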

While the raw totals suggest that diversity-related terms are rising, the proportions in Figure 1B suggest that the sets of diversity-related terms have changed relatively little over time. For example, the terms associated with aging, ancestry, minority, sex/gender, sexuality, social class, and race/ethnicity have stayed relatively stable. Perhaps the most interesting trend is the sex/gender line, which rises from 8% to 12% of articles between 1990 and 1996 before returning to around 8% in 2005. In contrast, the terms associated with population, genetic, class, and diversity have all increased steadily from their 1990 baselines.

Raw Growth in Diversity Abstracts

The previous plots demonstrate that some, but not all, diversity terms have generally risen in biomedical research. However, we were also interested in understanding how these terms changed in the context of biomedical research that focuses specifically on the topic of diversity.

Figure 1C once again shows consistent growth in the raw word frequencies from 1990-2017. For example, the terms diversity, population, and genetic exhibit a 12-fold to 15-fold increase over that time. Sex/gender, race/ethnicity, and aging rise even more steeply (around a 21- to 22-fold increase), while ancestry, sexuality, and most of the other terms display minimal variation over time.
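A fold increase is simply the ratio of the final count to the initial count, so categories that start from very small counts can show very large multipliers. The short Python sketch below applies this arithmetic to the Figure 1A counts for the term diversity reported earlier (2 mentions in 1992, 381 in 2017); the function name is illustrative.

```python
def fold_change(start, end):
    """Ratio of the final count to the initial count."""
    if start == 0:
        # A ratio is undefined when a term was never used in the baseline year.
        raise ValueError("fold change is undefined for a starting count of zero")
    return end / start

# Figure 1A counts for the term diversity: 2 mentions in 1992, 381 in 2017.
print(fold_change(2, 381))  # 190.5
```

This sensitivity to small baselines is one reason the raw multipliers are read alongside the proportional analyses.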

Proportional Growth in Diversity Abstracts

Figure 1D shows proportional growth and tells a slightly different story than Figure 1C, with several nonlinear trends over our period of analysis. First, the term diversity appears in about 75% of abstracts at the start of the period, drops to 66% at its lowest point in 1996, and then increases to 89% of abstracts in 2017. As a reminder, this sample was generated by searching for the term diversity in abstracts, titles, and keywords. Our sampling method thus helps explain both why the term diversity is so frequent relative to others in the dataset and why it does not appear in every abstract: an article could enter the sample through its title or keywords alone. Figure 1D also shows that the use of population, genetic, and sex/gender terms each increases by about 7-8%. On the other hand, the terms cultural, ancestry, and race/ethnicity follow inverted-U (roughly parabolic) patterns, rising through the mid-2000s before declining through 2017. For instance, the term cultural nearly doubles in usage from 1990 to 2006 before dropping to near its baseline in 2017. Similarly, race/ethnicity rises from 3.2% of abstracts to nearly 10% in 2004 before dropping to 6.7% in 2017. Ancestry follows a similar trend, rising from 1.4% in 1990 to 4% in 2009 before declining to 2.6% in 2017.
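Because several of these trends rise and then fall, and because a change in a proportion can be stated either in percentage points or as a percent of the baseline, it can help to compute both explicitly. The Python sketch below does so for the ancestry values given above (1.4% in 1990, 4% in 2009, 2.6% in 2017), treating those three points as the whole series. The helper name is hypothetical, and the sketch does not reconstruct the full yearly data behind Figure 1D.

```python
def describe_trend(series):
    """Summarize a {year: proportion} series: peak year and start-to-end change."""
    years = sorted(series)
    start, end = series[years[0]], series[years[-1]]
    peak_year = max(years, key=lambda y: series[y])
    return {
        "peak_year": peak_year,
        # Absolute change, expressed in percentage points.
        "pp_change": round((end - start) * 100, 1),
        # Relative change, expressed as a percent of the starting value.
        "relative_change": round((end - start) / start * 100, 1),
    }

# Ancestry values reported for Figure 1D.
ancestry = {1990: 0.014, 2009: 0.040, 2017: 0.026}
print(describe_trend(ancestry))
```

For ancestry, a gain of about 1.2 percentage points corresponds to a relative increase of roughly 86%, so the two readings of "percent" can differ by a wide margin.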

Main Takeaways

Overall, our results support the notion that the use of the term “diversity” is increasing in biomedical abstracts, but taken as a whole, this growth is quite modest. The raw term frequencies suggest that diversity-related terminology has grown dramatically over time, but the proportional analyses temper these findings by showing that some categories, like sex/gender and race/ethnicity, have actually declined in recent years. Future work will need to examine what social, political, and economic factors may have contributed to these declines.

Appendices

Here is a list of the terms in each category analyzed above.

Sensitivity Checks

Most of our terms are specific to the category in which they are listed. However, we were concerned that the term “minority” could produce false positives. As a sensitivity check, we found that in the large majority of cases, minority was used alongside diversity, race/ethnicity, sex/gender, or sexuality terms in our dataset. Nevertheless, a 2-9% false positive rate remains in any given year, consisting of cases where the term minority does not clearly relate to diversity research. We considered removing these instances from our dataset but kept them for consistency: there is likely some noise in almost every category, and altering only one category would make accurate comparison across the categories in the plots more difficult. The abstracts listed in the following table are examples of the false positives included in the minority category.
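A co-occurrence screen of the kind described above could be sketched in Python as follows. This is an assumed, simplified version: the regular expressions stand in for the real category term lists, the sample abstracts are invented, and the function name is hypothetical. A keyword screen cannot replace reading the flagged abstracts, since “minority” can appear next to a context term and still be unrelated to diversity research.

```python
import re
from collections import defaultdict

MINORITY = re.compile(r"\bminorit(?:y|ies)\b", re.IGNORECASE)
# Hypothetical context terms standing in for the diversity, race/ethnicity,
# sex/gender, and sexuality categories.
CONTEXT = re.compile(
    r"\b(?:diversity|diverse|racial|race|ethnic|ethnicity|women|gender|sex|gay|lesbian)\b",
    re.IGNORECASE,
)

def flag_minority_candidates(abstracts):
    """Per year, the share of 'minority' abstracts with no co-occurring context term."""
    with_minority = defaultdict(int)
    uncorroborated = defaultdict(int)
    for year, text in abstracts:
        if MINORITY.search(text):
            with_minority[year] += 1
            if not CONTEXT.search(text):
                # Candidate false positive: flag for manual review.
                uncorroborated[year] += 1
    return {year: uncorroborated[year] / with_minority[year] for year in sorted(with_minority)}

# Invented sample abstracts as (year, text) pairs.
sample = [
    (2005, "Recruitment of racial and ethnic minority participants."),
    (2005, "A minority of patients developed resistance."),
    (2005, "Minority stress among gay and lesbian adults."),
    (2006, "Only a minority of samples carried the allele."),
]
rates = flag_minority_candidates(sample)
print(rates)
```

In this toy sample, one of the three 2005 abstracts and the single 2006 abstract use “minority” only in its numerical sense, which is the kind of case the false positive rate above describes.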